We would like to thank the reviewers for their thoughtful feedback and comments, which will undoubtedly improve the paper.
We will update our paper to reflect your comments, fix typos, and include missing references.

Both Eq. 3 and Eq. 4 are motivated by the policy improvement theorem. Whereas Eq. 3 improves the policy by choosing a better action to copy, Eq. 4 does so in a soft manner. Eq. 4 is therefore chosen. We will update the paper to make this more explicit.

R2 - reproducibility: We have open-sourced the code for CRR on GitHub, and the link will be made available.
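To illustrate the hard versus soft distinction between Eq. 3 and Eq. 4 described above, here is a minimal sketch. The equations are not reproduced in this response, so it assumes Eq. 3 is a binary advantage filter, 1[A(s, a) > 0], and Eq. 4 an exponential advantage weight, exp(A(s, a) / beta), each used to weight a behaviour-cloning loss. The function names, the temperature `beta`, the clip value `max_weight`, and the loss form are hypothetical illustrations, not the released CRR code.

```python
import math
from typing import Callable, Sequence


def hard_weight(advantage: float) -> float:
    """Assumed form of Eq. 3: a binary filter that copies an action
    only when its estimated advantage is positive."""
    return 1.0 if advantage > 0 else 0.0


def soft_weight(advantage: float, beta: float = 1.0, max_weight: float = 20.0) -> float:
    """Assumed form of Eq. 4: an exponential weight exp(A / beta).
    Every action contributes, but better actions count more. The clip at
    max_weight is a hypothetical guard against very large weights."""
    return min(math.exp(advantage / beta), max_weight)


def weighted_bc_loss(
    log_probs: Sequence[float],
    advantages: Sequence[float],
    weight_fn: Callable[[float], float],
) -> float:
    """Weighted behaviour-cloning loss: -mean(f(A) * log pi(a|s))."""
    weights = [weight_fn(a) for a in advantages]
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)


if __name__ == "__main__":
    advantages = [-1.0, 0.0, 2.0]
    print([hard_weight(a) for a in advantages])  # [0.0, 0.0, 1.0]
    print([round(soft_weight(a), 3) for a in advantages])  # [0.368, 1.0, 7.389]
```

With the hard filter, the two actions with non-positive advantage are discarded entirely; with the soft weight, they still receive some weight, and the clearly better action is upweighted rather than merely kept.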